111 research outputs found
Performances of Multi-Level and Multi-Component Compressed Bitmap Indices
This paper presents a systematic study of two large subsets of bitmap indexing methods that use multi-component and multi-level encodings. Earlier studies on bitmap indexes are either empirical or cover only uncompressed versions. Since most bitmap indexes in use are compressed, we set out to study the performance characteristics of these compressed indexes. To make the analyses manageable, we choose a particularly simple but efficient compression method called the Word-Aligned Hybrid (WAH) code. Under this compression method, a number of bitmap indexes are shown to be optimal because their worst-case time complexity for answering a query is a linear function of the number of hits. Since compressed bitmap indexes behave drastically differently from uncompressed ones, our analyses also lead to a number of new methods that are much more efficient than commonly used ones. To validate the analyses, we implement a number of the best methods and measure their performance against well-known indexes. The fastest new methods are predicted, and observed, to be 5 to 10 times faster than well-known indexing methods.
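The WAH code mentioned above divides a bitmap into 31-bit groups and stores each group either as a literal word or, when the group is all zeros or all ones, as part of a run-length fill word. The following is a minimal illustrative sketch of the encoding side, not the paper's implementation; the word layout (MSB flags a fill word, bit 30 carries the fill value, the low 30 bits count groups) follows the published WAH description:

```python
WORD = 31  # payload bits per 32-bit WAH word

def wah_encode(bits):
    """Encode a list of 0/1 bits with Word-Aligned Hybrid compression.
    Literal words have MSB 0 and carry 31 raw bits; fill words have
    MSB 1, bit 30 = fill value, low 30 bits = run length in groups."""
    # pad to a multiple of 31 bits (a real index records the true length)
    padded = bits + [0] * (-len(bits) % WORD)
    groups = [padded[i:i + WORD] for i in range(0, len(padded), WORD)]
    words = []
    for g in groups:
        val = int("".join(map(str, g)), 2)
        if val == 0 or val == (1 << WORD) - 1:      # all-0 or all-1 group
            fill = 1 if val else 0
            if words and words[-1] >> 31 and (words[-1] >> 30) & 1 == fill:
                words[-1] += 1                      # extend the current fill
            else:
                words.append((1 << 31) | (fill << 30) | 1)
        else:
            words.append(val)                       # literal word
    return words
```

Because long uniform runs collapse into single fill words, query operations such as bitwise OR can skip over compressed runs in one step, which is what makes the worst-case query cost proportional to the number of hits.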
Grid collector: an event catalog with automated file management
High Energy Nuclear Physics (HENP) experiments such as STAR at BNL and ATLAS at CERN produce large amounts of data that are stored as files on mass storage systems in computer centers. In these files, the basic unit of data is an event. Analysis is typically performed on a selected set of events. The files containing these events have to be located, copied from mass storage systems to disks before analysis, and removed when no longer needed. These file management tasks are tedious and time consuming. Typically, all events contained in the files are read into memory before a selection is made. Since the time to read the events dominates the overall execution time, reading the unwanted events needlessly increases the analysis time. The Grid Collector is a set of software modules that work together to address these two issues. It automates the file management tasks and provides ''direct'' access to the selected events for analyses. It is currently integrated with the STAR analysis framework. Users can select events based on tags, such as ''production date between March 10 and 20, and the number of charged tracks > 100.'' The Grid Collector locates the files containing relevant events, transfers the files across the Grid if necessary, and delivers the events to the analysis code through familiar iterators. There have been some research efforts to address the file management issues, but the Grid Collector is unique in that it addresses the event access issue together with the file management issues. This makes it useful to a larger variety of users.
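The key idea above is to consult an event catalog first, so an analysis touches only the files that actually hold matching events. The sketch below illustrates that selection step with a hypothetical catalog layout (file name mapped to (event id, tags) pairs); the real Grid Collector uses compressed bitmap indexes and Grid file transfer, neither of which is modeled here:

```python
def select_events(catalog, predicate):
    """Return a read plan: for each file, the ids of events whose tags
    satisfy the predicate. Files with no matching events are skipped
    entirely, so they never need to be staged from mass storage."""
    plan = {}
    for fname, events in catalog.items():
        wanted = [eid for eid, tags in events if predicate(tags)]
        if wanted:
            plan[fname] = wanted
    return plan

# Hypothetical catalog; the tag name n_charged mirrors the abstract's
# example condition "number of charged tracks > 100".
catalog = {
    "run1.data": [(1, {"n_charged": 120}), (2, {"n_charged": 50})],
    "run2.data": [(3, {"n_charged": 10})],
}
plan = select_events(catalog, lambda t: t["n_charged"] > 100)
```

Here only run1.data would be fetched, and only event 1 delivered to the analysis iterator.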
Parallel in situ indexing for data-intensive computing
As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing the data to disks and reading the data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt in situ processing technology to generate the indexes, thus removing the need to read data from disks and enabling the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time by up to 200 times depending on the fraction of data selected, and using the in situ data processing system can effectively reduce the time needed to create the indexes, by up to 10 times with our in situ technique when using identical parallel settings.
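The selective-access benefit described above comes from equality-encoded bitmap indexes: one bit vector per distinct value, so a query touches only the bitmaps whose values match. This is a minimal uncompressed sketch (FastBit additionally compresses each bitmap with WAH and supports binning and range encodings, which are omitted here); Python integers stand in for bit vectors:

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Equality-encoded bitmap index: one bitmap (an int used as a bit
    vector) per distinct value; bit i is set when row i holds the value."""
    index = defaultdict(int)
    for row, v in enumerate(values):
        index[v] |= 1 << row
    return dict(index)

def query(index, nrows, predicate):
    """Answer 'value satisfies predicate' by OR-ing the matching
    bitmaps, then return the row ids of the hits."""
    hits = 0
    for v, bm in index.items():
        if predicate(v):
            hits |= bm
    return [r for r in range(nrows) if hits >> r & 1]
```

Building the index requires one full pass over the data, which is exactly the pass an in situ system like ADIOS already makes while writing the data out; piggybacking index construction on that pass avoids a separate read of the whole data set.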
Storage Resource Manager version 2.2: design, implementation, and testing experience
Storage Services are crucial components of the Worldwide LHC Computing Grid Infrastructure, spanning more than 200 sites and serving computing and storage resources to the High Energy Physics LHC communities. Up to tens of Petabytes of data are collected every year by the four LHC experiments at CERN. To process these large data volumes it is important to establish a protocol and a very efficient interface to the various storage solutions adopted by the WLCG sites. In this work we report on the experience acquired during the definition of the Storage Resource Manager v2.2 protocol. In particular, we focus on the study performed to enhance the interface and make it suitable for use by the WLCG communities. At the moment 5 different storage solutions implement the SRM v2.2 interface: BeStMan (LBNL), CASTOR (CERN and RAL), dCache (DESY and FNAL), DPM (CERN), and StoRM (INFN and ICTP). After a detailed review of the protocol, various test suites have been written to identify the most effective set of tests: the S2 test suite from CERN and the SRM-Tester test suite from LBNL. Such test suites have helped verify the consistency and coherence of the proposed protocol and validate the existing implementations. We conclude our work by describing the results achieved.
The Scientific Data Management Center
With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive, end-to-end data management solutions ranging from initial data acquisition to final analysis and visualization. The Scientific Data Management (SDM) Center is bringing a set of advanced data management technologies to DOE scientists in various application domains including astrophysics, climate, fusion, and biology. Equally important, it has established collaborations with these scientists to better understand their science as well as their forthcoming data management and data analytics challenges. The SDM center has provided advanced data management technologies to DOE domain scientists in the areas of storage efficient access, data mining and analysis, and scientific process automation
Multidimensionality in Statistical, OLAP, and Scientific Databases